R3 Q4: Decreasing Hazards at Older Ages - Censoring Bias, Not Competing Risks¶
Reviewer Concern¶
Referee #3, Comment #4: "It's unclear to me whether and how well the model actually accounts for competing risks, such as death, emigration, and other strong competitors. This can also be caused by diagnostic hierarchy. What scares me are the reported hazards (e.g. figures S6-8), which seem to decrease for very old individuals, which can be interpreted as decreased risks. This looks like a competing risk issue."
Our Response¶
We appreciate the reviewer's concern about decreasing hazards at older ages. However, our analysis demonstrates that this pattern was not due to competing risks, but rather censoring bias in the original model. We have addressed this issue by implementing a corrected E matrix that accounts for realistic censoring times.
Key Points:¶
- The Problem: The original model (trained with enrollment E) showed artificial decreases in disease risk at older ages (after ~65-70 years)
- The Root Cause: This was due to censoring bias - the E matrix assumed all patients were followed until age 81, but many were actually censored earlier
- The Solution: We created a corrected E matrix that caps event times at each patient's actual maximum follow-up age (
max_censor - 30) - The Result: The corrected model shows realistic, continuously increasing disease risk with age, as expected biologically
This Notebook Shows:¶
- Censoring patterns across cohorts (UKB, AOU, MGB) - entry ages, exit ages, follow-up duration
- Old vs New prevalence - demonstrating the artificial U-shaped pattern in the old model
- Old vs New phi (disease signatures) - showing how learned patterns match corrected prevalence
- Reference to Figure 2 - signature patterns that show the problematic declining trends in the old model
1. Understanding Censoring Patterns Across Cohorts¶
First, we visualize the distribution of entry ages, exit ages, and follow-up duration for each cohort. This demonstrates the variable censoring times that must be accounted for in the model.
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 300
%matplotlib inline
print("Setup complete")
Setup complete
# ============================================================================
# UKB: Load Censor Data
# ============================================================================
print("\n" + "="*60)
print("UKB: Loading censor data...")
print("="*60)
censor_path_ukb = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/censor_info.csv')
censor_df_ukb = pd.read_csv(censor_path_ukb)
# Get entry age (enrollment age)
if 'age' in censor_df_ukb.columns:
entry_age_ukb = censor_df_ukb['age'].values
elif 'enrollment_age' in censor_df_ukb.columns:
entry_age_ukb = censor_df_ukb['enrollment_age'].values
else:
raise ValueError("Could not find entry age column")
# Get exit age (max follow-up age)
if 'max_censor' in censor_df_ukb.columns:
exit_age_ukb = censor_df_ukb['max_censor'].values
else:
raise ValueError("Could not find exit age column")
# Calculate follow-up duration
followup_duration_ukb = exit_age_ukb - entry_age_ukb
print(f"UKB: Loaded {len(censor_df_ukb)} patients")
print(f" Entry age: min={entry_age_ukb.min():.1f}, max={entry_age_ukb.max():.1f}, mean={entry_age_ukb.mean():.1f}")
print(f" Exit age: min={exit_age_ukb.min():.1f}, max={exit_age_ukb.max():.1f}, mean={exit_age_ukb.mean():.1f}")
print(f" Follow-up: min={followup_duration_ukb.min():.1f}, max={followup_duration_ukb.max():.1f}, mean={followup_duration_ukb.mean():.1f}")
============================================================ UKB: Loading censor data... ============================================================ UKB: Loaded 407878 patients Entry age: min=37.0, max=73.0, mean=57.3 Exit age: min=51.0, max=81.0, mean=71.7 Follow-up: min=8.0, max=17.0, mean=14.4
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/create_censor_distribution_figure
Loading censor data from: /Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/censor_info.csv Columns: ['identifier', 'age', 'sex', 'f.22009.0.1', 'f.22009.0.2', 'f.22009.0.3', 'f.22009.0.4', 'f.22009.0.5', 'f.22009.0.6', 'f.22009.0.7', 'f.22009.0.8', 'f.22009.0.9', 'f.22009.0.10', 'f.53.0.0', 'follow_up_days', 'follow_up_years', 'censor_age', 'enrollment_timepoint', 'max_censor', 'max_censor_py'] Loaded 407878 patients Entry age: min=37.0, max=73.0, mean=57.3 Exit age: min=51.0, max=81.0, mean=71.7 Follow-up duration: min=8.0, max=17.0, mean=14.4 Saving figure to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/figures/censor_age_distributions.png ✓ Figure saved successfully! ============================================================ SUMMARY STATISTICS ============================================================ Entry Age: Mean: 57.30 years Median: 59.00 years Range: 37 - 73 years Exit Age: Mean: 71.65 years Median: 73.00 years Range: 51 - 81 years Follow-up Duration: Mean: 14.35 years Median: 14.00 years Range: 8.0 - 17.0 years ============================================================
<Figure size 1920x1440 with 0 Axes>
2. The Problem: Artificial Decreasing Hazards in the Old Model¶
The original model (trained with enrollment E) showed artificial decreases in disease risk at older ages. This is visible in:
- Figure 2 signature patterns (see
Figure2_Population_Level_Patterns.ipynb) - many signatures show declining trends after age 65-70 - Prevalence curves - the old prevalence shows U-shaped patterns
- Learned phi patterns - the model learned these artificial patterns
Why This Happened:¶
The original E matrix assumed all patients were followed until age 81, regardless of their actual follow-up time. This created censoring bias:
- Patients with longer follow-up (who would have higher disease risk at older ages) were censored earlier in reality
- The model saw fewer events at older ages, leading to artificially low prevalence
- This created the U-shaped or declining pattern that concerned the reviewer
The Solution: Corrected E Matrix¶
We created a corrected E matrix that:
- Caps each patient's event times at their actual maximum follow-up age:
max_censor - 30 - Uses at-risk filtering when computing prevalence (only includes patients who could have been observed at each age)
- Results in realistic, continuously increasing disease risk with age
# ============================================================================
# Compare Old vs New Prevalence and Phi for Key Diseases
# This code is adapted from R1_Clinical_Utility_Dynamic_Risk_Updating.ipynb Cell 10
# ============================================================================
from scipy.special import expit as sigmoid
# Load both old and new prevalence
base_path = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/')
prevalence_t_new = torch.load(base_path / 'prevalence_t_corrected.pt', weights_only=False)
essentials = torch.load(base_path / 'model_essentials.pt', weights_only=False)
prevalence_t_old = essentials['prevalence_t']
# Load both old and new master checkpoints
master_checkpoint_old = torch.load(base_path / 'master_for_fitting_pooled_all_data.pt', weights_only=False, map_location='cpu')
master_checkpoint_new = torch.load(base_path / 'master_for_fitting_pooled_correctedE.pt', weights_only=False, map_location='cpu')
# Get phi from both models
phi_old = master_checkpoint_old['model_state_dict']['phi']
phi_new = master_checkpoint_new['model_state_dict']['phi']
if torch.is_tensor(phi_old):
phi_old = phi_old.detach().cpu().numpy()
if torch.is_tensor(phi_new):
phi_new = phi_new.detach().cpu().numpy()
# Get clusters
if 'clusters' in master_checkpoint_old:
clusters_old = master_checkpoint_old['clusters']
else:
ref_checkpoint = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/model_with_kappa_bigam.pt',
map_location='cpu', weights_only=False)
clusters_old = ref_checkpoint['clusters']
if 'clusters' in master_checkpoint_new:
clusters_new = master_checkpoint_new['clusters']
else:
ref_checkpoint = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/model_with_kappa_bigam.pt',
map_location='cpu', weights_only=False)
clusters_new = ref_checkpoint['clusters']
if torch.is_tensor(clusters_old):
clusters_old = clusters_old.detach().cpu().numpy()
if torch.is_tensor(clusters_new):
clusters_new = clusters_new.detach().cpu().numpy()
# Convert to numpy if needed
if torch.is_tensor(prevalence_t_new):
prevalence_t_new = prevalence_t_new.numpy()
if torch.is_tensor(prevalence_t_old):
prevalence_t_old = prevalence_t_old.numpy()
# Load disease names
disease_names = essentials['disease_names']
# Focus on key diseases that show the pattern
# Disease 47 = Type 2 diabetes, Disease 112 = Myocardial infarction
disease_indices = [47, 112]
disease_labels = ['Type 2 diabetes', 'Myocardial infarction']
# Age points
T = prevalence_t_new.shape[1]
ages = np.arange(30, 30 + T)
# Create comparison plots
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
for idx, (disease_idx, disease_label) in enumerate(zip(disease_indices, disease_labels)):
disease_name = disease_names[disease_idx] if disease_idx < len(disease_names) else f"Disease {disease_idx}"
# Get assigned signatures
assigned_sig_old = int(clusters_old[disease_idx])
assigned_sig_new = int(clusters_new[disease_idx])
# Get prevalence and phi
prev_old = prevalence_t_old[disease_idx, :]
prev_new = prevalence_t_new[disease_idx, :]
phi_old_assigned = phi_old[assigned_sig_old, disease_idx, :]
phi_new_assigned = phi_new[assigned_sig_new, disease_idx, :]
# Convert phi to probability
phi_old_prob = sigmoid(phi_old_assigned)
phi_new_prob = sigmoid(phi_new_assigned)
# Handle NaN values
prev_old = np.where(np.isnan(prev_old), np.nan, prev_old)
prev_new = np.where(np.isnan(prev_new), np.nan, prev_new)
phi_old_prob = np.where(np.isnan(phi_old_assigned), np.nan, phi_old_prob)
phi_new_prob = np.where(np.isnan(phi_new_assigned), np.nan, phi_new_prob)
# Top row: Prevalence comparison
ax = axes[0, idx]
ax.plot(ages, prev_old, 'b-', linewidth=3, label='Prevalence - OLD (enrollment E)', alpha=0.8, zorder=10)
ax.plot(ages, prev_new, 'r--', linewidth=3, label='Prevalence - NEW (corrected E)', alpha=0.8, zorder=10)
ax.set_ylabel('Probability', fontsize=12, fontweight='bold')
ax.set_xlabel('Age (years)', fontsize=12, fontweight='bold')
ax.set_title(f'{disease_label} (Index {disease_idx})\nPrevalence: OLD vs NEW',
fontsize=13, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.legend(fontsize=10, loc='best')
# Bottom row: Phi comparison
ax = axes[1, idx]
ax.plot(ages, phi_old_prob, 'b:', linewidth=3, label=f'Sigmoid(Phi, sig {assigned_sig_old}) - OLD', alpha=0.8, zorder=9)
ax.plot(ages, phi_new_prob, 'r-.', linewidth=3, label=f'Sigmoid(Phi, sig {assigned_sig_new}) - NEW', alpha=0.8, zorder=9)
ax.set_ylabel('Probability', fontsize=12, fontweight='bold')
ax.set_xlabel('Age (years)', fontsize=12, fontweight='bold')
ax.set_title(f'{disease_label} (Index {disease_idx})\nSigmoid(Phi): OLD vs NEW',
fontsize=13, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.legend(fontsize=10, loc='best')
# Add annotations for old model drop
if len(prev_old[~np.isnan(prev_old)]) > 0:
old_max_idx = np.nanargmax(prev_old)
old_max_age = ages[old_max_idx]
old_max_val = prev_old[old_max_idx]
if old_max_idx < len(prev_old) - 5:
old_end_val = np.nanmean(prev_old[-5:])
if old_end_val < old_max_val * 0.9: # Significant drop
axes[0, idx].annotate(f'OLD drops after age {old_max_age:.0f}',
xy=(old_max_age, old_max_val),
xytext=(old_max_age + 5, old_max_val * 0.7),
arrowprops=dict(arrowstyle='->', color='blue', lw=2),
fontsize=10, color='blue', fontweight='bold',
bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))
plt.tight_layout()
plt.show()
print("\n✓ Old vs New prevalence and phi comparison plots created!")
print(" Blue lines: OLD model (enrollment E) - shows artificial drop after peak")
print(" Red lines: NEW model (corrected E) - shows realistic continuous increase")
✓ Old vs New prevalence and phi comparison plots created! Blue lines: OLD model (enrollment E) - shows artificial drop after peak Red lines: NEW model (corrected E) - shows realistic continuous increase
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load old and new prevalence
base_path = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/')
prevalence_t_new = torch.load(base_path / 'prevalence_t_corrected.pt', weights_only=False)
essentials = torch.load(base_path / 'model_essentials.pt', weights_only=False)
prevalence_t_old = essentials['prevalence_t']
# Convert to numpy if needed
if torch.is_tensor(prevalence_t_new):
prevalence_t_new = prevalence_t_new.numpy()
if torch.is_tensor(prevalence_t_old):
prevalence_t_old = prevalence_t_old.numpy()
print(f"New prevalence shape: {prevalence_t_new.shape}")
print(f"Old prevalence shape: {prevalence_t_old.shape}")
# Diseases to plot
disease_indices = [47, 10, 11, 112, 127,17]
disease_names = essentials['disease_names']
# Age points (assuming age 30 = timepoint 0)
T = prevalence_t_new.shape[1]
ages = np.arange(30, 30 + T)
# Create subplots
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes = axes.flatten()
for idx, d_idx in enumerate(disease_indices):
ax = axes[idx]
# Get disease name
if d_idx < len(disease_names):
disease_name = disease_names[d_idx]
else:
disease_name = f"Disease {d_idx}"
# Get prevalence values
prev_new = prevalence_t_new[d_idx, :]
prev_old = prevalence_t_old[d_idx, :]
# Plot
ax.plot(ages, prev_old, 'b-', linewidth=2, label='Old (enrollment E)', alpha=0.7)
ax.plot(ages, prev_new, 'r--', linewidth=2, label='New (corrected E)', alpha=0.7)
ax.set_xlabel('Age (years)', fontsize=12, fontweight='bold')
ax.set_ylabel('Prevalence', fontsize=12, fontweight='bold')
ax.set_title(f'{disease_name} (Index {d_idx})', fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
# Add text showing if there are differences
max_diff = np.nanmax(np.abs(prev_new - prev_old))
if max_diff > 0.001:
ax.text(0.05, 0.95, f'Max diff: {max_diff:.4f}',
transform=ax.transAxes, fontsize=9,
verticalalignment='top', bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.5))
plt.tight_layout()
plt.show()
print("\n✓ Prevalence comparison plots created!")
print(" Blue line: Old prevalence (enrollment E)")
print(" Red dashed: New prevalence (corrected E)")
New prevalence shape: (348, 52) Old prevalence shape: (348, 52)
✓ Prevalence comparison plots created! Blue line: Old prevalence (enrollment E) Red dashed: New prevalence (corrected E)
# Direct comparison: Old vs New logit(prevalence) for disease 47
# This clearly shows the difference - old drops, new continues to increase
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
# Load both old and new prevalence
base_path = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/')
prevalence_t_new = torch.load(base_path / 'prevalence_t_corrected.pt', weights_only=False)
essentials = torch.load(base_path / 'model_essentials.pt', weights_only=False)
prevalence_t_old = essentials['prevalence_t']
# Load both old and new master checkpoints
master_checkpoint_old = torch.load(base_path / 'master_for_fitting_pooled_all_data.pt', weights_only=False, map_location='cpu')
master_checkpoint_new = torch.load(base_path / 'master_for_fitting_pooled_correctedE.pt', weights_only=False, map_location='cpu')
# Get phi from both models
phi_old = master_checkpoint_old['model_state_dict']['phi']
phi_new = master_checkpoint_new['model_state_dict']['phi']
if torch.is_tensor(phi_old):
phi_old = phi_old.detach().cpu().numpy()
if torch.is_tensor(phi_new):
phi_new = phi_new.detach().cpu().numpy()
# Get clusters (disease-to-signature mapping)
if 'clusters' in master_checkpoint_old:
clusters_old = master_checkpoint_old['clusters']
else:
ref_checkpoint = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/model_with_kappa_bigam.pt',
map_location='cpu', weights_only=False)
clusters_old = ref_checkpoint['clusters']
if 'clusters' in master_checkpoint_new:
clusters_new = master_checkpoint_new['clusters']
else:
ref_checkpoint = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/model_with_kappa_bigam.pt',
map_location='cpu', weights_only=False)
clusters_new = ref_checkpoint['clusters']
if torch.is_tensor(clusters_old):
clusters_old = clusters_old.detach().cpu().numpy()
if torch.is_tensor(clusters_new):
clusters_new = clusters_new.detach().cpu().numpy()
# Convert to numpy if needed
if torch.is_tensor(prevalence_t_new):
prevalence_t_new = prevalence_t_new.numpy()
if torch.is_tensor(prevalence_t_old):
prevalence_t_old = prevalence_t_old.numpy()
# Load disease names
disease_names = essentials['disease_names']
# Focus on disease 47 (Type 2 diabetes) - or change to 112 for MI
disease_idx = 112
disease_name = disease_names[disease_idx] if disease_idx < len(disease_names) else f"Disease {disease_idx}"
# Get assigned signatures
assigned_sig_old = int(clusters_old[disease_idx])
assigned_sig_new = int(clusters_new[disease_idx])
# Age points
T = prevalence_t_new.shape[1]
ages = np.arange(30, 30 + T)
# Get prevalence for both old and new
prev_old = prevalence_t_old[disease_idx, :]
prev_new = prevalence_t_new[disease_idx, :]
# Get phi for both old and new (in assigned signatures)
phi_old_assigned = phi_old[assigned_sig_old, disease_idx, :]
phi_new_assigned = phi_new[assigned_sig_new, disease_idx, :]
# Convert phi to probability scale using sigmoid
from scipy.special import expit as sigmoid
phi_old_prob = sigmoid(phi_old_assigned)
phi_new_prob = sigmoid(phi_new_assigned)
# Handle NaN values
prev_old = np.where(np.isnan(prev_old), np.nan, prev_old)
prev_new = np.where(np.isnan(prev_new), np.nan, prev_new)
phi_old_prob = np.where(np.isnan(phi_old_assigned), np.nan, phi_old_prob)
phi_new_prob = np.where(np.isnan(phi_new_assigned), np.nan, phi_new_prob)
# Create comparison plot
fig, ax = plt.subplots(1, 1, figsize=(14, 8))
# Plot both prevalence curves (probability scale)
line1 = ax.plot(ages, prev_old, 'b-', linewidth=3, label='Prevalence - OLD (enrollment E)', alpha=0.8, zorder=10)
line2 = ax.plot(ages, prev_new, 'r--', linewidth=3, label='Prevalence - NEW (corrected E)', alpha=0.8, zorder=10)
# Plot both phi curves converted to probability (in assigned signatures)
line3 = ax.plot(ages, phi_old_prob, 'b:', linewidth=3, label=f'Sigmoid(Phi, sig {assigned_sig_old}) - OLD', alpha=0.8, zorder=9)
line4 = ax.plot(ages, phi_new_prob, 'r-.', linewidth=3, label=f'Sigmoid(Phi, sig {assigned_sig_new}) - NEW', alpha=0.8, zorder=9)
ax.set_ylabel('Probability', fontsize=12, fontweight='bold')
ax.set_xlabel('Age (years)', fontsize=14, fontweight='bold')
ax.set_title(f'{disease_name} (Index {disease_idx})\nPrevalence & Sigmoid(Phi): OLD vs NEW',
fontsize=15, fontweight='bold')
ax.grid(True, alpha=0.3)
# Update legend to include all 4 lines
lines = line1 + line2 + line3 + line4
ax.legend(lines, [l.get_label() for l in lines], fontsize=11, loc='best')
# Add annotation highlighting the key difference
# Find where old starts to drop (if it does)
if len(prev_old[~np.isnan(prev_old)]) > 0:
old_max_idx = np.nanargmax(prev_old)
old_max_age = ages[old_max_idx]
old_max_val = prev_old[old_max_idx]
# Check if old drops after peak
if old_max_idx < len(prev_old) - 5:
old_end_val = np.nanmean(prev_old[-5:])
if old_end_val < old_max_val * 0.9: # Significant drop (10% or more)
ax.annotate(f'OLD drops after age {old_max_age:.0f}',
xy=(old_max_age, old_max_val),
xytext=(old_max_age + 5, old_max_val * 0.7),
arrowprops=dict(arrowstyle='->', color='blue', lw=2),
fontsize=11, color='blue', fontweight='bold',
bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))
# Show that new continues to increase
new_end_val = np.nanmean(prev_new[-5:])
new_start_val = np.nanmean(prev_new[:5])
if new_end_val > new_start_val * 1.1: # Significant increase (10% or more)
ax.annotate('NEW continues to increase',
xy=(ages[-5], new_end_val),
xytext=(ages[-10], new_end_val * 1.2),
arrowprops=dict(arrowstyle='->', color='red', lw=2),
fontsize=11, color='red', fontweight='bold',
bbox=dict(boxstyle='round', facecolor='lightcoral', alpha=0.7))
plt.tight_layout()
plt.show()
print("\n✓ Direct comparison of OLD vs NEW prevalence AND sigmoid(phi) created!")
print(f" Blue solid: OLD prevalence - shows censoring bias (drops after peak)")
print(f" Red dashed: NEW prevalence - corrected for censoring (continues to increase)")
print(f" Blue dotted: OLD sigmoid(phi, sig {assigned_sig_old}) - tracks old prevalence")
print(f" Red dash-dot: NEW sigmoid(phi, sig {assigned_sig_new}) - tracks new prevalence")
print(f"\nThis demonstrates that corrected E removes the artificial U-shaped pattern!")
print(f"Both sigmoid(phi) curves closely track their respective prevalence curves (high correlation).")
✓ Direct comparison of OLD vs NEW prevalence AND sigmoid(phi) created! Blue solid: OLD prevalence - shows censoring bias (drops after peak) Red dashed: NEW prevalence - corrected for censoring (continues to increase) Blue dotted: OLD sigmoid(phi, sig 5) - tracks old prevalence Red dash-dot: NEW sigmoid(phi, sig 5) - tracks new prevalence This demonstrates that corrected E removes the artificial U-shaped pattern! Both sigmoid(phi) curves closely track their respective prevalence curves (high correlation).
# Compare phi (from corrected E model) with corrected prevalence for important signatures
# For disease 47, show what phi would be if it were in other signatures
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from scipy.special import expit as sigmoid
# Load master checkpoint (corrected E)
base_path = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/')
master_checkpoint = torch.load(base_path / 'master_for_fitting_pooled_correctedE.pt', weights_only=False, map_location='cpu')
# Get phi and clusters
phi = master_checkpoint['model_state_dict']['phi'] # Shape: (K, D, T)
if torch.is_tensor(phi):
phi = phi.detach().cpu().numpy()
# Load clusters (disease-to-signature mapping)
# Try to get from checkpoint or from reference file
if 'clusters' in master_checkpoint:
clusters = master_checkpoint['clusters']
else:
# Load from reference checkpoint
ref_checkpoint = torch.load('/Users/sarahurbut/Dropbox-Personal/model_with_kappa_bigam.pt',
map_location='cpu', weights_only=False)
clusters = ref_checkpoint['clusters']
if torch.is_tensor(clusters):
clusters = clusters.detach().cpu().numpy()
# Load corrected prevalence
prevalence_t_new = torch.load(base_path / 'prevalence_t_corrected.pt', weights_only=False)
if torch.is_tensor(prevalence_t_new):
prevalence_t_new = prevalence_t_new.numpy()
# Load disease names
essentials = torch.load(base_path / 'model_essentials.pt', weights_only=False)
disease_names = essentials['disease_names']
print(f"Phi shape: {phi.shape} (K={phi.shape[0]} signatures, D={phi.shape[1]} diseases, T={phi.shape[2]} timepoints)")
print(f"Clusters shape: {clusters.shape}")
print(f"Prevalence shape: {prevalence_t_new.shape}")
# Focus on disease 47 (Type 2 diabetes) - show what phi would be in different signatures
disease_idx = 112
disease_name = disease_names[disease_idx] if disease_idx < len(disease_names) else f"Disease {disease_idx}"
assigned_sig = int(clusters[disease_idx])
print(f"\nDisease {disease_idx} ({disease_name}) is assigned to signature {assigned_sig}")
# Age points
T = prevalence_t_new.shape[1]
ages = np.arange(30, 30 + T)
# Get prevalence for disease 47
prev_disease = prevalence_t_new[disease_idx, :]
# Convert prevalence to logit scale for comparison with phi
epsilon = 1e-8
prev_disease_clipped = np.clip(prev_disease, epsilon, 1 - epsilon)
logit_prev = np.log(prev_disease_clipped / (1 - prev_disease_clipped))
# Handle NaN values
logit_prev = np.where(np.isnan(prev_disease), np.nan, logit_prev)
# Create plot showing disease 47's phi in different signatures vs logit(prevalence)
fig, ax = plt.subplots(1, 1, figsize=(14, 8))
# Plot logit(prevalence) - this is what phi should match (up to psi offset)
line1 = ax.plot(ages, logit_prev, 'r-', linewidth=3, label='Logit(Prevalence) (corrected E)', alpha=0.9, zorder=10)
# Plot phi for disease 47 in its assigned signature (highlighted)
phi_assigned = phi[assigned_sig, disease_idx, :]
line2 = ax.plot(ages, phi_assigned, 'b-', linewidth=3, label=f'Phi in assigned sig {assigned_sig}', alpha=0.9, zorder=9)
# Plot phi for disease 47 in other signatures (lighter, for comparison)
K = phi.shape[0]
colors = plt.cm.tab20(np.linspace(0, 1, K))
for sig_idx in range(K):
if sig_idx != assigned_sig:
phi_other = phi[sig_idx, disease_idx, :]
ax.plot(ages, phi_other, '--', linewidth=1.5, color=colors[sig_idx],
alpha=0.4, label=f'Sig {sig_idx}', zorder=1)
ax.set_ylabel('Logit scale', fontsize=12, fontweight='bold')
ax.set_xlabel('Age (years)', fontsize=14, fontweight='bold')
ax.set_title(f'{disease_name} (Index {disease_idx})\nPhi vs Logit(Prevalence) across different signatures',
fontsize=15, fontweight='bold')
ax.grid(True, alpha=0.3)
# Combined legend
lines = line1 + line2
labels = [l.get_label() for l in lines]
ax.legend(lines, labels, loc='best', fontsize=11)
ax.text(0.02, 0.98, f'Other signatures shown in lighter dashed lines\n(assigned sig {assigned_sig} in bold blue)',
transform=ax.transAxes, fontsize=9, verticalalignment='top',
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))
# Calculate correlation and mean difference for assigned signature
valid_mask = ~np.isnan(prev_disease) & ~np.isnan(logit_prev) & ~np.isnan(phi_assigned)
if valid_mask.sum() > 5:
corr = np.corrcoef(phi_assigned[valid_mask], logit_prev[valid_mask])[0, 1]
mean_diff = np.nanmean(phi_assigned[valid_mask] - logit_prev[valid_mask])
ax.text(0.98, 0.02, f'Corr (sig {assigned_sig}): {corr:.3f}\nMean diff: {mean_diff:.2f}',
transform=ax.transAxes, fontsize=11, ha='right',
bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8))
plt.tight_layout()
plt.show()
print("\n✓ Phi vs Logit(Prevalence) comparison created!")
print(f" Red line: Logit(Prevalence) for {disease_name}")
print(f" Bold blue line: Phi in assigned signature {assigned_sig}")
print(f" Light dashed lines: Phi in other signatures (for comparison)")
print(f"\nNote: Phi should match logit(prevalence) up to a constant offset (psi)")
print(f"High correlation indicates phi learned patterns consistent with corrected prevalence")
Phi shape: (21, 348, 52) (K=21 signatures, D=348 diseases, T=52 timepoints) Clusters shape: (348,) Prevalence shape: (348, 52) Disease 112 (Myocardial infarction) is assigned to signature 5
✓ Phi vs Logit(Prevalence) comparison created! Red line: Logit(Prevalence) for Myocardial infarction Bold blue line: Phi in assigned signature 5 Light dashed lines: Phi in other signatures (for comparison) Note: Phi should match logit(prevalence) up to a constant offset (psi) High correlation indicates phi learned patterns consistent with corrected prevalence
# Compare phi (from OLD enrollment E model) with OLD prevalence for disease 47
# This shows the same relationship for the old model
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from scipy.special import expit as sigmoid
# Load OLD master checkpoint (enrollment E)
base_path = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/')
master_checkpoint_old = torch.load(base_path / 'master_for_fitting_pooled_all_data.pt', weights_only=False, map_location='cpu')
# Get old phi and clusters
phi_old = master_checkpoint_old['model_state_dict']['phi'] # Shape: (K, D, T)
if torch.is_tensor(phi_old):
phi_old = phi_old.detach().cpu().numpy()
# Load clusters (disease-to-signature mapping) - should be same as new model
if 'clusters' in master_checkpoint_old:
clusters_old = master_checkpoint_old['clusters']
else:
# Load from reference checkpoint
ref_checkpoint = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/model_with_kappa_bigam.pt',
map_location='cpu', weights_only=False)
clusters_old = ref_checkpoint['clusters']
if torch.is_tensor(clusters_old):
clusters_old = clusters_old.detach().cpu().numpy()
# Load OLD prevalence from essentials
essentials = torch.load(base_path / 'model_essentials.pt', weights_only=False)
prevalence_t_old = essentials['prevalence_t']
if torch.is_tensor(prevalence_t_old):
prevalence_t_old = prevalence_t_old.numpy()
# Load disease names
disease_names = essentials['disease_names']
print(f"Old phi shape: {phi_old.shape} (K={phi_old.shape[0]} signatures, D={phi_old.shape[1]} diseases, T={phi_old.shape[2]} timepoints)")
print(f"Old clusters shape: {clusters_old.shape}")
print(f"Old prevalence shape: {prevalence_t_old.shape}")
# Focus on disease 47 (Type 2 diabetes) - show phi vs logit(prevalence) for OLD model
disease_idx = 112
disease_name = disease_names[disease_idx] if disease_idx < len(disease_names) else f"Disease {disease_idx}"
assigned_sig_old = int(clusters_old[disease_idx])
print(f"\nDisease {disease_idx} ({disease_name}) is assigned to signature {assigned_sig_old} in OLD model")
# Age points
T = prevalence_t_old.shape[1]
ages = np.arange(30, 30 + T)
# Get OLD prevalence for disease 47
prev_disease_old = prevalence_t_old[disease_idx, :]
# Convert OLD prevalence to logit scale for comparison with phi
epsilon = 1e-8
prev_disease_old_clipped = np.clip(prev_disease_old, epsilon, 1 - epsilon)
logit_prev_old = np.log(prev_disease_old_clipped / (1 - prev_disease_old_clipped))
# Handle NaN values
logit_prev_old = np.where(np.isnan(prev_disease_old), np.nan, logit_prev_old)
# Create plot showing disease 47's phi in different signatures vs logit(prevalence) for OLD model
fig, ax = plt.subplots(1, 1, figsize=(14, 8))
# Plot logit(prevalence) - this is what phi should match (up to psi offset)
line1 = ax.plot(ages, logit_prev_old, 'r-', linewidth=3, label='Logit(Prevalence) (OLD enrollment E)', alpha=0.9, zorder=10)
# Plot phi for disease 47 in its assigned signature (highlighted)
phi_assigned_old = phi_old[assigned_sig_old, disease_idx, :]
line2 = ax.plot(ages, phi_assigned_old, 'b-', linewidth=3, label=f'Phi in assigned sig {assigned_sig_old} (OLD)', alpha=0.9, zorder=9)
# Plot phi for disease 47 in other signatures (lighter, for comparison)
K = phi_old.shape[0]
colors = plt.cm.tab20(np.linspace(0, 1, K))
for sig_idx in range(K):
if sig_idx != assigned_sig_old:
phi_other_old = phi_old[sig_idx, disease_idx, :]
ax.plot(ages, phi_other_old, '--', linewidth=1.5, color=colors[sig_idx],
alpha=0.4, label=f'Sig {sig_idx}', zorder=1)
ax.set_ylabel('Logit scale', fontsize=12, fontweight='bold')
ax.set_xlabel('Age (years)', fontsize=14, fontweight='bold')
ax.set_title(f'{disease_name} (Index {disease_idx}) - OLD MODEL\nPhi vs Logit(Prevalence) across different signatures',
fontsize=15, fontweight='bold')
ax.grid(True, alpha=0.3)
# Combined legend
lines = line1 + line2
labels = [l.get_label() for l in lines]
ax.legend(lines, labels, loc='best', fontsize=11)
ax.text(0.02, 0.98, f'Other signatures shown in lighter dashed lines\n(assigned sig {assigned_sig_old} in bold blue)\nOLD MODEL (enrollment E)',
transform=ax.transAxes, fontsize=9, verticalalignment='top',
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))
# Calculate correlation and mean difference for assigned signature
valid_mask = ~np.isnan(prev_disease_old) & ~np.isnan(logit_prev_old) & ~np.isnan(phi_assigned_old)
if valid_mask.sum() > 5:
corr_old = np.corrcoef(phi_assigned_old[valid_mask], logit_prev_old[valid_mask])[0, 1]
mean_diff_old = np.nanmean(phi_assigned_old[valid_mask] - logit_prev_old[valid_mask])
ax.text(0.98, 0.02, f'Corr (sig {assigned_sig_old}): {corr_old:.3f}\nMean diff: {mean_diff_old:.2f}',
transform=ax.transAxes, fontsize=11, ha='right',
bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8))
plt.tight_layout()
plt.show()
print("\n✓ Phi vs Logit(Prevalence) comparison for OLD MODEL created!")
print(f" Red line: Logit(Prevalence) for {disease_name} (OLD enrollment E)")
print(f" Bold blue line: Phi in assigned signature {assigned_sig_old} (OLD)")
print(f" Light dashed lines: Phi in other signatures (for comparison)")
print(f"\nNote: Phi should match logit(prevalence) up to a constant offset (psi)")
print(f"High correlation indicates phi learned patterns consistent with OLD prevalence")
Old phi shape: (21, 348, 52) (K=21 signatures, D=348 diseases, T=52 timepoints) Old clusters shape: (348,) Old prevalence shape: (348, 52) Disease 112 (Myocardial infarction) is assigned to signature 5 in OLD model
✓ Phi vs Logit(Prevalence) comparison for OLD MODEL created! Red line: Logit(Prevalence) for Myocardial infarction (OLD enrollment E) Bold blue line: Phi in assigned signature 5 (OLD) Light dashed lines: Phi in other signatures (for comparison) Note: Phi should match logit(prevalence) up to a constant offset (psi) High correlation indicates phi learned patterns consistent with OLD prevalence
# Reload the module to get the updated function
# Load all three trained models
import sys
sys.path.append('/Users/sarahurbut/aladynoulli2/pyScripts')
import torch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
print("Loading model checkpoints...")
# UKB: master checkpoint with pooled phi
ukb_master = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/master_for_fitting_pooled_correctedE.pt', map_location='cpu')
ukb_phi = ukb_master['model_state_dict']['phi']
if torch.is_tensor(ukb_phi):
ukb_phi = ukb_phi.detach().cpu().numpy()
# AOU: trained model
aou_checkpoint = torch.load('/Users/sarahurbut/aladynoulli2/aou_model_initialized.pt', map_location='cpu')
aou_checkpoint=torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox/aou_batches/aou_model_batch_2_20000_30000.pt',map_location='cpu')
aou_phi = aou_checkpoint['model_state_dict']['phi']
aou_phi=torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox/aou_batches/pooled_phi.pt',map_location='cpu')
# MGB: trained model
mgb_checkpoint = torch.load('/Users/sarahurbut/aladynoulli2/mgb_model_initialized.pt', map_location='cpu')
mgb_phi = mgb_checkpoint['model_state_dict']['phi']
if torch.is_tensor(mgb_phi):
mgb_phi = mgb_phi.detach().cpu().numpy()
# Load clusters and disease names for each cohort
# UKB
ukb_checkpoint_ref = torch.load('/Users/sarahurbut/Dropbox-Personal/model_with_kappa_bigam.pt', map_location='cpu')
ukb_clusters = ukb_checkpoint_ref['clusters']
if torch.is_tensor(ukb_clusters):
ukb_clusters = ukb_clusters.detach().cpu().numpy()
ukb_disease_names = ukb_checkpoint_ref['disease_names']
if isinstance(ukb_disease_names, (list, tuple)):
ukb_disease_names = list(ukb_disease_names)
elif hasattr(ukb_disease_names, 'values'):
ukb_disease_names = ukb_disease_names.values.tolist()
# AOU
aou_clusters = aou_checkpoint['clusters']
if torch.is_tensor(aou_clusters):
aou_clusters = aou_clusters.detach().cpu().numpy()
aou_disease_names = aou_checkpoint.get('disease_names', None)
if aou_disease_names is None:
# Try to load from old checkpoint
aou_old = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/model_with_kappa_bigam_AOU.pt', map_location='cpu')
aou_disease_names = aou_old.get('disease_names', [f"Disease_{i}" for i in range(aou_phi.shape[1])])
if isinstance(aou_disease_names, (list, tuple)):
aou_disease_names = list(aou_disease_names)
elif hasattr(aou_disease_names, 'values'):
aou_disease_names = aou_disease_names.values.tolist()
# MGB
mgb_clusters = mgb_checkpoint['clusters']
if torch.is_tensor(mgb_clusters):
mgb_clusters = mgb_clusters.detach().cpu().numpy()
mgb_disease_names = mgb_checkpoint.get('disease_names', None)
if mgb_disease_names is None:
# Try to load from old checkpoint
mgb_old = torch.load('/Users/sarahurbut/Dropbox-Personal/model_with_kappa_bigam_MGB.pt', map_location='cpu')
mgb_disease_names = mgb_old.get('disease_names', [f"Disease_{i}" for i in range(mgb_phi.shape[1])])
if isinstance(mgb_disease_names, (list, tuple)):
mgb_disease_names = list(mgb_disease_names)
elif hasattr(mgb_disease_names, 'values'):
mgb_disease_names = mgb_disease_names.values.tolist()
print(f"UKB: phi shape {ukb_phi.shape}, clusters shape {ukb_clusters.shape}, {len(ukb_disease_names)} diseases")
print(f"AOU: phi shape {aou_phi.shape}, clusters shape {aou_clusters.shape}, {len(aou_disease_names)} diseases")
print(f"MGB: phi shape {mgb_phi.shape}, clusters shape {mgb_clusters.shape}, {len(mgb_disease_names)} diseases")
# Determine number of signatures (should be 15 for all)
n_sigs = min(ukb_phi.shape[0], aou_phi.shape[0], mgb_phi.shape[0])
selected_sigs = list(range(n_sigs))
print(f"\nPlotting {n_sigs} signatures for each cohort")
import importlib
import sys
sys.path.append('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/main_paper_figures')
from plot_signature_patterns_fixed import *
# Plot UKB signatures
print("\n" + "="*80)
print("UKB Signature Patterns")
print("="*80)
fig_ukb = plot_signature_phi_patterns(
ukb_phi, ukb_clusters, ukb_disease_names,
selected_signatures=selected_sigs,
top_n_diseases=None, # Show all diseases in each signature
plot_probability=True, # Convert to probability
age_offset=30
)
plt.suptitle('UK Biobank', fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout(rect=[0, 0, 0.85, 0.98])
plt.show()
# Plot AOU signatures
print("\n" + "="*80)
print("AOU Signature Patterns")
print("="*80)
fig_aou = plot_signature_phi_patterns(
aou_phi, aou_clusters, aou_disease_names,
selected_signatures=selected_sigs,
top_n_diseases=None, # Show all diseases in each signature
plot_probability=True, # Convert to probability
age_offset=30
)
plt.suptitle('All of Us', fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout(rect=[0, 0, 0.85, 0.98])
plt.show()
# Plot MGB signatures
print("\n" + "="*80)
print("MGB Signature Patterns")
print("="*80)
fig_mgb = plot_signature_phi_patterns(
mgb_phi, mgb_clusters, mgb_disease_names,
selected_signatures=selected_sigs,
top_n_diseases=None, # Show all diseases in each signature
plot_probability=True, # Convert to probability
age_offset=30
)
plt.suptitle('MGB', fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout(rect=[0, 0, 0.85, 0.98])
plt.show()
print("\n✓ All signature plots generated!")
Loading model checkpoints... UKB: phi shape (21, 348, 52), clusters shape (348,), 348 diseases AOU: phi shape (21, 348, 52), clusters shape (348,), 348 diseases MGB: phi shape (21, 346, 51), clusters shape (346,), 346 diseases Plotting 21 signatures for each cohort ================================================================================ UKB Signature Patterns ================================================================================
/var/folders/fl/ng5crz0x0fnb6c6x8dk7tfth0000gn/T/ipykernel_73824/1787622968.py:15: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ukb_master = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/master_for_fitting_pooled_correctedE.pt', map_location='cpu')
/var/folders/fl/ng5crz0x0fnb6c6x8dk7tfth0000gn/T/ipykernel_73824/1787622968.py:21: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
aou_checkpoint = torch.load('/Users/sarahurbut/aladynoulli2/aou_model_initialized.pt', map_location='cpu')
/var/folders/fl/ng5crz0x0fnb6c6x8dk7tfth0000gn/T/ipykernel_73824/1787622968.py:22: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
aou_checkpoint=torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox/aou_batches/aou_model_batch_2_20000_30000.pt',map_location='cpu')
/var/folders/fl/ng5crz0x0fnb6c6x8dk7tfth0000gn/T/ipykernel_73824/1787622968.py:24: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
aou_phi=torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox/aou_batches/pooled_phi.pt',map_location='cpu')
/var/folders/fl/ng5crz0x0fnb6c6x8dk7tfth0000gn/T/ipykernel_73824/1787622968.py:27: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
mgb_checkpoint = torch.load('/Users/sarahurbut/aladynoulli2/mgb_model_initialized.pt', map_location='cpu')
/var/folders/fl/ng5crz0x0fnb6c6x8dk7tfth0000gn/T/ipykernel_73824/1787622968.py:34: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ukb_checkpoint_ref = torch.load('/Users/sarahurbut/Dropbox-Personal/model_with_kappa_bigam.pt', map_location='cpu')
================================================================================ AOU Signature Patterns ================================================================================
================================================================================ MGB Signature Patterns ================================================================================
✓ All signature plots generated!
4. Conclusion: Censoring Bias, Not Competing Risks¶
Summary:¶
The Problem: The original model showed decreasing hazards at older ages, which the reviewer correctly identified as concerning
The Root Cause: This was censoring bias, not competing risks:
- The E matrix assumed all patients were followed until age 81
- In reality, many patients were censored earlier (see distribution plots above)
- This created artificial decreases in disease risk at older ages
The Solution: We implemented a corrected E matrix that:
- Accounts for realistic censoring times (
max_censor - 30) - Uses at-risk filtering for prevalence calculations
- Results in realistic, continuously increasing disease risk with age
- Accounts for realistic censoring times (
The Evidence:
- Censor distributions show variable follow-up times across cohorts
- Old vs New prevalence clearly shows the artificial U-shaped pattern in the old model
- Old vs New phi demonstrates that learned patterns match corrected prevalence
- Figure 2 signature patterns show the systematic nature of the problem across multiple signatures and cohorts
Key Takeaway:¶
The decreasing hazards at older ages were not due to competing risks, but rather censoring bias in the original model. The corrected E approach addresses this issue, resulting in biologically realistic disease risk patterns that continue to increase with age, as expected.
Competing Risks Are Still Handled:¶
While the decreasing hazards were not due to competing risks, our model does handle competing risks appropriately:
- Multi-disease modeling: Patients remain at risk for all diseases even after developing one
- Death as competing event: Death is modeled as a disease outcome
- Censoring: Emigration and administrative censoring are handled through the E matrix
See R3_Competing_Risks.ipynb for a detailed discussion of how Aladynoulli handles competing risks.
aou_old = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/model_with_kappa_bigam_AOU.pt', map_location='cpu')
/var/folders/fl/ng5crz0x0fnb6c6x8dk7tfth0000gn/T/ipykernel_73824/3880472051.py:1: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
aou_old = torch.load('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/model_with_kappa_bigam_AOU.pt', map_location='cpu')
tensor([[-1.7026, -0.3778],
[-1.0642, 0.2354],
[ 0.3483, -1.1395],
...,
[-1.3551, -0.2762],
[-0.3538, -0.2892],
[-1.6775, -1.2120]])
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/plot_ukb_sigs.py
Number of signatures: 20 Number of diseases: 348 Found 40 batch files Loading enrollment_model_W0.0001_batch_0_10000.pt... Loading enrollment_model_W0.0001_batch_100000_110000.pt... Loading enrollment_model_W0.0001_batch_10000_20000.pt... Loading enrollment_model_W0.0001_batch_110000_120000.pt... Loading enrollment_model_W0.0001_batch_120000_130000.pt... Loading enrollment_model_W0.0001_batch_130000_140000.pt... Loading enrollment_model_W0.0001_batch_140000_150000.pt... Loading enrollment_model_W0.0001_batch_150000_160000.pt... Loading enrollment_model_W0.0001_batch_160000_170000.pt... Loading enrollment_model_W0.0001_batch_170000_180000.pt... Loading enrollment_model_W0.0001_batch_180000_190000.pt... Loading enrollment_model_W0.0001_batch_190000_200000.pt... Loading enrollment_model_W0.0001_batch_200000_210000.pt... Loading enrollment_model_W0.0001_batch_20000_30000.pt... Loading enrollment_model_W0.0001_batch_210000_220000.pt... Loading enrollment_model_W0.0001_batch_220000_230000.pt... Loading enrollment_model_W0.0001_batch_230000_240000.pt... Loading enrollment_model_W0.0001_batch_240000_250000.pt... Loading enrollment_model_W0.0001_batch_250000_260000.pt... Loading enrollment_model_W0.0001_batch_260000_270000.pt... Loading enrollment_model_W0.0001_batch_270000_280000.pt... Loading enrollment_model_W0.0001_batch_280000_290000.pt... Loading enrollment_model_W0.0001_batch_290000_300000.pt... Loading enrollment_model_W0.0001_batch_300000_310000.pt... Loading enrollment_model_W0.0001_batch_30000_40000.pt... Loading enrollment_model_W0.0001_batch_310000_320000.pt... Loading enrollment_model_W0.0001_batch_320000_330000.pt... Loading enrollment_model_W0.0001_batch_330000_340000.pt... Loading enrollment_model_W0.0001_batch_340000_350000.pt... Loading enrollment_model_W0.0001_batch_350000_360000.pt... Loading enrollment_model_W0.0001_batch_360000_370000.pt... Loading enrollment_model_W0.0001_batch_370000_380000.pt... Loading enrollment_model_W0.0001_batch_380000_390000.pt... Loading enrollment_model_W0.0001_batch_390000_400000.pt... Loading enrollment_model_W0.0001_batch_40000_50000.pt... Loading enrollment_model_W0.0001_batch_50000_60000.pt... Loading enrollment_model_W0.0001_batch_60000_70000.pt... Loading enrollment_model_W0.0001_batch_70000_80000.pt... Loading enrollment_model_W0.0001_batch_80000_90000.pt... Loading enrollment_model_W0.0001_batch_90000_100000.pt... Stacked phi shape: (40, 21, 348, 52) Number of batches: 40 ✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/paper_figs/supp/ukb_all_signatures_probabilities.pdf
============================================================
SIGNATURE SUMMARY
============================================================
Signature 0: 16 diseases
Top diseases:
Heart failure NOS: max prob = 0.032763
Pleurisy; pleural effusion: max prob = 0.032069
Atrial fibrillation and flutter: max prob = 0.029935
Congestive heart failure (CHF) NOS: max prob = 0.019836
Cardiomegaly: max prob = 0.018406
Signature 1: 21 diseases
Top diseases:
Arthropathy NOS: max prob = 0.121699
Osteoarthritis; localized: max prob = 0.022993
Pain in joint: max prob = 0.017354
Osteoarthrosis, generalized: max prob = 0.011762
Enthesopathy: max prob = 0.010728
Signature 2: 15 diseases
Top diseases:
Diaphragmatic hernia: max prob = 0.047559
Iron deficiency anemias, unspecified or not due to: max prob = 0.035158
Gastritis and duodenitis: max prob = 0.022132
Other specified gastritis: max prob = 0.018481
Benign neoplasm of other parts of digestive system: max prob = 0.014036
Signature 3: 82 diseases
Top diseases:
Hearing loss: max prob = 0.016847
Inguinal hernia: max prob = 0.010563
Viral infection: max prob = 0.008714
Gout: max prob = 0.008670
Right bundle branch block: max prob = 0.007062
Signature 4: 5 diseases
Top diseases:
Other upper respiratory disease: max prob = 0.001889
Septal Deviations/Turbinate Hypertrophy: max prob = 0.001619
Chronic sinusitis: max prob = 0.001136
Nasal polyps: max prob = 0.001043
Chronic pharyngitis and nasopharyngitis: max prob = 0.000598
Signature 5: 7 diseases
Top diseases:
Hypercholesterolemia: max prob = 0.080995
Other chronic ischemic heart disease, unspecified: max prob = 0.038841
Coronary atherosclerosis: max prob = 0.034803
Myocardial infarction: max prob = 0.028510
Angina pectoris: max prob = 0.023412
Signature 6: 8 diseases
Top diseases:
Secondary malignancy of lymph nodes: max prob = 0.014521
Secondary malignant neoplasm of liver: max prob = 0.011117
Malignant neoplasm, other: max prob = 0.006877
Secondary malignancy of bone: max prob = 0.006727
Cancer of bronchus; lung: max prob = 0.006718
Signature 7: 22 diseases
Top diseases:
Essential hypertension: max prob = 0.095155
GERD: max prob = 0.019439
Osteoporosis NOS: max prob = 0.017480
Obesity: max prob = 0.015335
Anxiety disorder: max prob = 0.011675
Signature 8: 28 diseases
Top diseases:
Postmenopausal bleeding: max prob = 0.013810
Polyp of corpus uteri: max prob = 0.008385
Prolapse of vaginal walls: max prob = 0.006546
Uterine leiomyoma: max prob = 0.006435
Uterine/Uterovaginal prolapse: max prob = 0.005382
Signature 9: 12 diseases
Top diseases:
Back pain: max prob = 0.041376
Spondylosis without myelopathy: max prob = 0.029758
Spinal stenosis: max prob = 0.015512
Degeneration of intervertebral disc: max prob = 0.005987
Other and unspecified disc disorder: max prob = 0.005940
Signature 10: 11 diseases
Top diseases:
Cataract: max prob = 0.081245
Senile cataract: max prob = 0.075690
Macular degeneration (senile) of retina NOS: max prob = 0.024037
Glaucoma: max prob = 0.017325
Primary open angle glaucoma: max prob = 0.005277
Signature 11: 8 diseases
Top diseases:
Cerebral ischemia: max prob = 0.029134
Cerebral artery occlusion, with cerebral infarctio: max prob = 0.014716
Cerebrovascular disease: max prob = 0.010799
Occlusion of cerebral arteries: max prob = 0.007988
Hemiplegia: max prob = 0.006357
Signature 12: 7 diseases
Top diseases:
Hydronephrosis: max prob = 0.004662
Calculus of kidney: max prob = 0.001928
Other disorders of the kidney and ureters: max prob = 0.001891
Pyelonephritis: max prob = 0.001568
Stricture/obstruction of ureter: max prob = 0.001508
Signature 13: 13 diseases
Top diseases:
Hyperplasia of prostate: max prob = 0.072676
Cancer of prostate: max prob = 0.032335
Other disorders of bladder: max prob = 0.017884
Urinary incontinence: max prob = 0.004599
Urethral stricture (not specified as infectious): max prob = 0.003764
Signature 14: 10 diseases
Top diseases:
Pneumococcal pneumonia: max prob = 0.039048
Chronic airway obstruction: max prob = 0.031798
Pneumonia: max prob = 0.028185
Tobacco use disorder: max prob = 0.025296
Obstructive chronic bronchitis: max prob = 0.013956
Signature 15: 5 diseases
Top diseases:
Type 2 diabetes: max prob = 0.096006
Type 2 diabetes with ophthalmic manifestations: max prob = 0.008435
Hypoglycemia: max prob = 0.006078
Diabetic retinopathy: max prob = 0.004421
Type 1 diabetes: max prob = 0.001710
Signature 16: 29 diseases
Top diseases:
Acute renal failure: max prob = 0.027081
Constipation: max prob = 0.021872
Urinary tract infection: max prob = 0.021260
Other anemias: max prob = 0.015805
Hypovolemia: max prob = 0.012096
Signature 17: 17 diseases
Top diseases:
Diverticulosis: max prob = 0.069188
Benign neoplasm of colon: max prob = 0.040037
Hemorrhoids: max prob = 0.029815
Hemorrhage of gastrointestinal tract: max prob = 0.012788
Anal and rectal polyp: max prob = 0.009284
Signature 18: 9 diseases
Top diseases:
Cholelithiasis: max prob = 0.011105
Peritoneal adhesions (postoperative) (postinfectio: max prob = 0.006880
Cholelithiasis with other cholecystitis: max prob = 0.004658
Calculus of bile duct: max prob = 0.003002
Cholecystitis without cholelithiasis: max prob = 0.002303
Signature 19: 23 diseases
Top diseases:
Chronic Kidney Disease, Stage III: max prob = 0.043541
Other non-epithelial cancer of skin: max prob = 0.035740
Chronic renal failure [CKD]: max prob = 0.022031
Malignant neoplasm of female breast: max prob = 0.011394
Disorder of skin and subcutaneous tissue NOS: max prob = 0.008837
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/plot_aou_all_signatures.py
Number of signatures: 20 Number of diseases: 348 Found 25 batch files Loading aou_model_batch_0_0_10000.pt... Loading aou_model_batch_10_100000_110000.pt... Loading aou_model_batch_11_110000_120000.pt... Loading aou_model_batch_12_120000_130000.pt... Loading aou_model_batch_13_130000_140000.pt... Loading aou_model_batch_14_140000_150000.pt... Loading aou_model_batch_15_150000_160000.pt... Loading aou_model_batch_16_160000_170000.pt... Loading aou_model_batch_17_170000_180000.pt... Loading aou_model_batch_18_180000_190000.pt... Loading aou_model_batch_19_190000_200000.pt... Loading aou_model_batch_1_10000_20000.pt... Loading aou_model_batch_20_200000_210000.pt... Loading aou_model_batch_21_210000_220000.pt... Loading aou_model_batch_22_220000_230000.pt... Loading aou_model_batch_23_230000_240000.pt... Loading aou_model_batch_24_240000_243303.pt... Loading aou_model_batch_2_20000_30000.pt... Loading aou_model_batch_3_30000_40000.pt... Loading aou_model_batch_4_40000_50000.pt... Loading aou_model_batch_5_50000_60000.pt... Loading aou_model_batch_6_60000_70000.pt... Loading aou_model_batch_7_70000_80000.pt... Loading aou_model_batch_8_80000_90000.pt... Loading aou_model_batch_9_90000_100000.pt... Stacked phi shape: (25, 21, 348, 52) Number of batches: 25 ✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/paper_figs/supp/aou_all_signatures_probabilities.pdf
============================================================
SIGNATURE SUMMARY
============================================================
Signature 0: 4 diseases
Top diseases:
Lump or mass in breast: max prob = 0.029428
Breast cancer [female]: max prob = 0.015682
Malignant neoplasm of female breast: max prob = 0.015309
Benign neoplasm of breast: max prob = 0.003141
Signature 1: 38 diseases
Top diseases:
Other anemias: max prob = 0.206598
Abdominal pain: max prob = 0.168872
Constipation: max prob = 0.119496
Acute renal failure: max prob = 0.094505
Anxiety disorder: max prob = 0.085695
Signature 2: 10 diseases
Top diseases:
Other symptoms/disorders or the urinary system: max prob = 0.432728
Urinary tract infection: max prob = 0.146930
Urinary incontinence: max prob = 0.123471
Other disorders of bladder: max prob = 0.073084
Cystitis: max prob = 0.063057
Signature 3: 18 diseases
Top diseases:
Sleep disorders: max prob = 0.188295
Acute upper respiratory infections of multiple or : max prob = 0.119846
Allergic rhinitis: max prob = 0.088958
Other upper respiratory disease: max prob = 0.072703
Chronic sinusitis: max prob = 0.063631
Signature 4: 8 diseases
Top diseases:
Cerebrovascular disease: max prob = 0.294064
Occlusion and stenosis of precerebral arteries: max prob = 0.083888
Cerebral ischemia: max prob = 0.083498
Occlusion of cerebral arteries: max prob = 0.047901
Cerebral artery occlusion, with cerebral infarctio: max prob = 0.044513
Signature 5: 67 diseases
Top diseases:
Hearing loss: max prob = 0.144751
Osteoporosis NOS: max prob = 0.071273
Hyperplasia of prostate: max prob = 0.070148
Hypothyroidism NOS: max prob = 0.066435
Other tests: max prob = 0.046753
Signature 6: 20 diseases
Top diseases:
Pain in joint: max prob = 0.400630
Osteoarthritis; localized: max prob = 0.347266
Osteoarthrosis, localized, primary: max prob = 0.318227
Pain in limb: max prob = 0.207642
Osteoarthrosis NOS: max prob = 0.139818
Signature 7: 23 diseases
Top diseases:
Disorders of menstruation and other abnormal bleed: max prob = 0.047544
Irregular menstrual cycle/bleeding: max prob = 0.041672
Postmenopausal bleeding: max prob = 0.032444
Uterine leiomyoma: max prob = 0.026958
Pain and other symptoms associated with female gen: max prob = 0.024331
Signature 8: 19 diseases
Top diseases:
Cataract: max prob = 0.253236
Senile cataract: max prob = 0.169537
Other retinal disorders: max prob = 0.110244
Disorders of lacrimal system: max prob = 0.084073
Disorders of vitreous body: max prob = 0.070963
Signature 9: 17 diseases
Top diseases:
Spondylosis and allied disorders: max prob = 0.244611
Spondylosis without myelopathy: max prob = 0.235041
Back pain: max prob = 0.217544
Spinal stenosis: max prob = 0.134888
Degeneration of intervertebral disc: max prob = 0.107438
Signature 10: 6 diseases
Top diseases:
Cholelithiasis: max prob = 0.022137
Cholecystitis without cholelithiasis: max prob = 0.005958
Other disorders of gallbladder: max prob = 0.005747
Calculus of bile duct: max prob = 0.005147
Cholelithiasis with other cholecystitis: max prob = 0.003517
Signature 11: 8 diseases
Top diseases:
Secondary malignant neoplasm: max prob = 0.018070
Malignant neoplasm, other: max prob = 0.010156
Cancer of bronchus; lung: max prob = 0.007949
Secondary malignancy of lymph nodes: max prob = 0.006827
Secondary malignancy of bone: max prob = 0.005030
Signature 12: 36 diseases
Top diseases:
Esophagitis, GERD and related diseases: max prob = 0.243377
GERD: max prob = 0.220191
Diverticulosis: max prob = 0.127060
Benign neoplasm of colon: max prob = 0.098380
Hemorrhoids: max prob = 0.087460
Signature 13: 13 diseases
Top diseases:
Seborrheic keratosis: max prob = 0.186705
Actinic keratosis: max prob = 0.127657
Benign neoplasm of skin: max prob = 0.113026
Other non-epithelial cancer of skin: max prob = 0.093619
Atopic/contact dermatitis due to other or unspecif: max prob = 0.089617
Signature 14: 13 diseases
Top diseases:
Essential hypertension: max prob = 0.543063
Hyperlipidemia: max prob = 0.498132
Chronic renal failure [CKD]: max prob = 0.124045
Hypercholesterolemia: max prob = 0.110824
Type 2 diabetes: max prob = 0.102795
Signature 15: 8 diseases
Top diseases:
Other disorders of the kidney and ureters: max prob = 0.221466
Cyst of kidney, acquired: max prob = 0.062327
Calculus of kidney: max prob = 0.038657
Stricture/obstruction of ureter: max prob = 0.030126
Hydronephrosis: max prob = 0.016809
Signature 16: 21 diseases
Top diseases:
Coronary atherosclerosis: max prob = 0.201959
Atrial fibrillation and flutter: max prob = 0.141548
Congestive heart failure (CHF) NOS: max prob = 0.065019
Angina pectoris: max prob = 0.055934
Rheumatic disease of the heart valves: max prob = 0.053720
Signature 17: 4 diseases
Top diseases:
Acquired toe deformities: max prob = 0.024934
Hammer toe (acquired): max prob = 0.018264
Hallux valgus (Bunion): max prob = 0.017298
Hallux rigidus: max prob = 0.003579
Signature 18: 3 diseases
Top diseases:
Other disorders of male genital organs: max prob = 0.007681
Other disorders of testis: max prob = 0.004539
Hydrocele: max prob = 0.003411
Signature 19: 12 diseases
Top diseases:
Chronic airway obstruction: max prob = 0.143445
Pneumonia: max prob = 0.127980
Pulmonary collapse; interstitial and compensatory : max prob = 0.067627
Pleurisy; pleural effusion: max prob = 0.060522
Respiratory failure: max prob = 0.051845
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/plot_mgb_all_signatures.py
Loading MGB model from: /Users/sarahurbut/aladynoulli2/mgb_model_initialized.pt MGB Model Info: Number of signatures: 21 Number of diseases: 346 Number of timepoints: 51 Phi shape: (21, 346, 51) ✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/paper_figs/supp/mgb_all_signatures_probabilities.pdf
============================================================
SIGNATURE SUMMARY
============================================================
Signature 0: 6 diseases
Top diseases:
Allergic rhinitis: max prob = 0.194667
Other upper respiratory disease: max prob = 0.139208
Chronic sinusitis: max prob = 0.086113
Chronic pharyngitis and nasopharyngitis: max prob = 0.084324
Septal Deviations/Turbinate Hypertrophy: max prob = 0.059923
Signature 1: 28 diseases
Top diseases:
GERD: max prob = 0.359364
Anxiety disorder: max prob = 0.192219
Constipation: max prob = 0.189788
Angina pectoris: max prob = 0.158121
Major depressive disorder: max prob = 0.155168
Signature 2: 24 diseases
Top diseases:
Pain in joint: max prob = 0.584942
Osteoarthrosis, localized, primary: max prob = 0.352947
Pain in limb: max prob = 0.295932
Arthropathy NOS: max prob = 0.230406
Osteoarthrosis NOS: max prob = 0.230329
Signature 3: 11 diseases
Top diseases:
Other disorders of bone and cartilage: max prob = 0.387234
Osteoporosis NOS: max prob = 0.336999
Cyst of kidney, acquired: max prob = 0.123913
Calculus of kidney: max prob = 0.091491
Lump or mass in breast: max prob = 0.063974
Signature 4: 24 diseases
Top diseases:
Other anemias: max prob = 0.433596
Acute renal failure: max prob = 0.261868
Hypovolemia: max prob = 0.180752
Hyposmolality and/or hyponatremia: max prob = 0.174598
Hypotension NOS: max prob = 0.166550
Signature 5: 20 diseases
Top diseases:
Coronary atherosclerosis: max prob = 0.433287
Arrhythmia (cardiac) NOS: max prob = 0.411315
Mitral valve disease: max prob = 0.163696
Heart failure NOS: max prob = 0.147981
Myocardial infarction: max prob = 0.115983
Signature 6: 15 diseases
Top diseases:
Essential hypertension: max prob = 0.718604
Hyperlipidemia: max prob = 0.646927
Hypercholesterolemia: max prob = 0.282516
Type 2 diabetes: max prob = 0.156797
Hypertensive chronic kidney disease: max prob = 0.155512
Signature 7: 16 diseases
Top diseases:
Other disorders of stomach and duodenum: max prob = 0.191542
Diaphragmatic hernia: max prob = 0.140030
Hemorrhage of gastrointestinal tract: max prob = 0.103782
Esophageal bleeding (varices/hemorrhage): max prob = 0.089892
Reflux esophagitis: max prob = 0.075354
Signature 8: 19 diseases
Top diseases:
Diverticulosis: max prob = 0.397814
Benign neoplasm of colon: max prob = 0.377744
Hemorrhoids: max prob = 0.238933
Other disorders of intestine: max prob = 0.103568
Hemorrhage of rectum and anus: max prob = 0.056652
Signature 9: 13 diseases
Top diseases:
Spondylosis without myelopathy: max prob = 0.417103
Back pain: max prob = 0.363663
Spinal stenosis: max prob = 0.319552
Neuralgia, neuritis, and radiculitis NOS: max prob = 0.203076
Degeneration of intervertebral disc: max prob = 0.172247
Signature 10: 7 diseases
Top diseases:
Occlusion and stenosis of precerebral arteries: max prob = 0.250653
Occlusion of cerebral arteries: max prob = 0.193089
Cerebral artery occlusion, with cerebral infarctio: max prob = 0.133621
Transient cerebral ischemia: max prob = 0.092226
Cerebrovascular disease: max prob = 0.077111
Signature 11: 7 diseases
Top diseases:
Malignant neoplasm, other: max prob = 0.030744
Secondary malignant neoplasm: max prob = 0.030538
Secondary malignancy of lymph nodes: max prob = 0.021778
Secondary malignancy of respiratory organs: max prob = 0.014536
Secondary malignancy of bone: max prob = 0.014412
Signature 12: 68 diseases
Top diseases:
Hearing loss: max prob = 0.145443
Hypothyroidism NOS: max prob = 0.116772
Disorders of calcium/phosphorus metabolism: max prob = 0.038185
Neoplasm of uncertain behavior: max prob = 0.038139
Osteopenia or other disorder of bone and cartilage: max prob = 0.032239
Signature 13: 7 diseases
Top diseases:
Cholelithiasis: max prob = 0.028710
Other disorders of gallbladder: max prob = 0.014672
Other disorders of biliary tract: max prob = 0.010990
Calculus of bile duct: max prob = 0.008653
Cholecystitis without cholelithiasis: max prob = 0.006446
Signature 14: 13 diseases
Top diseases:
Pulmonary collapse; interstitial and compensatory : max prob = 0.314121
Pleurisy; pleural effusion: max prob = 0.253552
Pneumonia: max prob = 0.222463
Chronic airway obstruction: max prob = 0.107440
Respiratory failure: max prob = 0.091213
Signature 15: 12 diseases
Top diseases:
Seborrheic keratosis: max prob = 0.365774
Actinic keratosis: max prob = 0.309453
Benign neoplasm of skin: max prob = 0.287626
Disorder of skin and subcutaneous tissue NOS: max prob = 0.218130
Other non-epithelial cancer of skin: max prob = 0.169486
Signature 16: 16 diseases
Top diseases:
Senile cataract: max prob = 0.237880
Cataract: max prob = 0.201631
Macular degeneration (senile) of retina NOS: max prob = 0.125817
Glaucoma: max prob = 0.115289
Disorders of vitreous body: max prob = 0.108282
Signature 17: 8 diseases
Top diseases:
Urinary tract infection: max prob = 0.380519
Urinary incontinence: max prob = 0.226799
Other disorders of bladder: max prob = 0.174585
Postmenopausal atrophic vaginitis: max prob = 0.061724
Functional disorders of bladder: max prob = 0.022796
Signature 18: 12 diseases
Top diseases:
Hyperplasia of prostate: max prob = 0.359122
Cancer of prostate: max prob = 0.160336
Stricture/obstruction of ureter: max prob = 0.088360
Inguinal hernia: max prob = 0.051144
Other disorders of male genital organs: max prob = 0.032960
Signature 19: 20 diseases
Top diseases:
Uterine leiomyoma: max prob = 0.021206
Postmenopausal bleeding: max prob = 0.019076
Disorders of menstruation and other abnormal bleed: max prob = 0.017812
Infertility, female: max prob = 0.010714
Irregular menstrual cycle: max prob = 0.010249